Add action-agent runtime and generated config pipeline#306
Add action-agent runtime and generated config pipeline#306skywhite1024 wants to merge 33 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds the first executable “action-agent” runtime slice for running already-generated agent/gym configs end-to-end: loading/validating nominal task graphs, normalizing atomic-action specs, executing them in the sim (with a tableware env adapter), plus a CLI and unit tests.
Changes:
- Introduce a nominal task-graph runtime (
AgentTaskGraph) and JSON bundle compiler/validator. - Add an atomic-action spec normalization + execution bridge to
embodichain.lab.sim.atomic_actions. - Provide a tableware agent env adapter and a
run_agent.pyCLI, with unit tests for usage tracking, schema validation, and atomic runtime integration.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/gen_sim/action_agent_pipeline/test_llm_usage.py | Unit tests for LLM usage normalization/recording and wrapper behavior. |
| tests/gen_sim/action_agent_pipeline/test_graph_spec_backend_atomic.py | Unit tests for graph compilation accepting/rejecting action schemas. |
| tests/gen_sim/action_agent_pipeline/test_backend_atomic_runtime.py | Unit tests for atomic-action spec normalization and runtime execution bridge. |
| embodichain/gen_sim/action_agent_pipeline/utils/mllm.py | OpenAI/LangChain client factory with usage tracking wrapper. |
| embodichain/gen_sim/action_agent_pipeline/utils/llm_usage.py | JSONL usage logging + aggregation utilities and LangChain proxy wrapper. |
| embodichain/gen_sim/action_agent_pipeline/utils/llm_json.py | Helpers to extract/normalize JSON objects from LLM responses. |
| embodichain/gen_sim/action_agent_pipeline/utils/llm_config.py | Shared LLM config loading from env + gen_config + local .env files. |
| embodichain/gen_sim/action_agent_pipeline/utils/init.py | Package init for utils module. |
| embodichain/gen_sim/action_agent_pipeline/runtime/task_graph.py | Deterministic nominal graph runtime and executed-action list wrapper. |
| embodichain/gen_sim/action_agent_pipeline/runtime/graph_compiler.py | Loader/validator/compiler for nominal task graphs into runtime objects. |
| embodichain/gen_sim/action_agent_pipeline/runtime/atom_actions.py | Atomic-action spec normalization and execution, incl. parallel arm execution. |
| embodichain/gen_sim/action_agent_pipeline/runtime/atom_action_utils.py | Runtime helpers for arm-side resolution and agent-state synchronization. |
| embodichain/gen_sim/action_agent_pipeline/runtime/init.py | Package init for runtime module. |
| embodichain/gen_sim/action_agent_pipeline/prompts/task_prompt.py | Task-agent prompt builder for nominal atomic-action graphs. |
| embodichain/gen_sim/action_agent_pipeline/prompts/basic_background.txt | Shared environment background prompt text. |
| embodichain/gen_sim/action_agent_pipeline/prompts/atom_actions.txt | Shared atomic-action JSON schema prompt text. |
| embodichain/gen_sim/action_agent_pipeline/prompts/init.py | Package init exporting TaskPrompt. |
| embodichain/gen_sim/action_agent_pipeline/env_adapters/tableware/success.py | Config-driven success predicate evaluation for tableware tasks. |
| embodichain/gen_sim/action_agent_pipeline/env_adapters/tableware/base_agent_env.py | Tableware agent env adapter wiring Task/Compile agents and state caching. |
| embodichain/gen_sim/action_agent_pipeline/env_adapters/tableware/atomic_actions.py | Gym env registration for the atomic-actions agent environment. |
| embodichain/gen_sim/action_agent_pipeline/env_adapters/tableware/init.py | Package init for tableware env adapters. |
| embodichain/gen_sim/action_agent_pipeline/env_adapters/init.py | Package init for env adapters. |
| embodichain/gen_sim/action_agent_pipeline/cli/run_agent.py | CLI entrypoint to execute action-agent tasks from existing configs. |
| embodichain/gen_sim/action_agent_pipeline/cli/init.py | Package init for CLI module. |
| embodichain/gen_sim/action_agent_pipeline/agents/task_agent.py | TaskAgent wrapper to generate and persist the nominal task graph. |
| embodichain/gen_sim/action_agent_pipeline/agents/llm.py | Shared LLM construction (safe initialization + usage-stage tagging). |
| embodichain/gen_sim/action_agent_pipeline/agents/compile_agent.py | CompileAgent wrapper to persist compiled graph bundles and execute them. |
| embodichain/gen_sim/action_agent_pipeline/agents/agent_base.py | Shared agent base for prompt file resolution/loading. |
| embodichain/gen_sim/action_agent_pipeline/agents/init.py | Package init for agents module. |
| embodichain/gen_sim/action_agent_pipeline/init.py | Package init for action-agent pipeline. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if edge.get("left_arm_action") is None and edge.get("right_arm_action") is None: | ||
| raise ValueError(f"Nominal edge '{edge_id}' must define an arm action.") |
| def _pose(env, uid: str) -> torch.Tensor: | ||
| return env.sim.get_rigid_object(uid).get_local_pose(to_matrix=True) |
| def extract_drive_calls(code_str: str) -> List[str]: | ||
| """Extract all drive() function calls from a code string. | ||
|
|
||
| Args: | ||
| code_str: Python code string to parse. |
| runtime_kwargs = _runtime_kwargs(kwargs, getattr(self, "prompt_kwargs", {})) | ||
| graph = compile_agent_graph_from_file(graph_file_path) | ||
| result = graph.run(**runtime_kwargs) | ||
| print("Compiled agent graph executed successfully.") | ||
| return result |
|
3cff8d8 to
7412fc1
Compare
|
Thanks for the detailed review. I updated the PR branch and addressed the action-agent cleanup items:
Validation in conda run -n embodichain040 python -m pytest \
tests/gen_sim/action_agent_pipeline/test_backend_atomic_runtime.py \
tests/gen_sim/action_agent_pipeline/test_graph_spec_backend_atomic.py \
tests/gen_sim/action_agent_pipeline/test_llm_usage.py \
-qResult: I also updated the PR description because the PR branch has now been synced to the current local branch, so the scope is broader than the original runtime-only slice and includes generated config / demo pipeline support. |
Code reviewBranch: 17 commits, 57 files, +17,269 LOC. The PR adds the action-agent runtime + generated-config pipeline for image-to-tabletop manipulation. The cleanup items from the prior review (duplicate atomic-actions, Good design patterns
Points to improveCorrectness / robustness
EmbodiChain/embodichain/gen_sim/action_agent_pipeline/runtime/atom_actions.py Lines 956 to 958 in 24901bd
EmbodiChain/embodichain/gen_sim/action_agent_pipeline/runtime/atom_actions.py Lines 436 to 440 in 24901bd
EmbodiChain/embodichain/gen_sim/action_agent_pipeline/utils/mllm.py Lines 39 to 46 in 24901bd Structure / contracts
EmbodiChain/embodichain/gen_sim/action_agent_pipeline/runtime/atom_actions.py Lines 487 to 491 in 24901bd
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Module structure:
|
| File | agents/hierarchy/ |
gen_sim/.../agents/ |
Verdict |
|---|---|---|---|
agent_base.py |
AgentBase ABC + _resolve_prompt_path (95 lines) |
AgentBase ABC + _resolve_prompt_path (95 lines) |
Near-verbatim duplicate. Only diffs: gen_sim adds from __future__ import annotations + PEP 604 union, and get_composed_observations() drops the env.get_obs_for_agent() coupling. _resolve_prompt_path is byte-identical. |
task_agent.py |
Emits free-text plan → agent_generated_plan.txt |
Emits JSON task graph → agent_task_graph.json |
Same skeleton (constructor, generate() flow, act() pass-through), different output contract. |
llm.py |
Azure factory + 4 module-level instances | OpenAI factory via shared utils/mllm.py + 1 instance |
Redundant factory. Gen_sim already factored the shared factory out to utils/mllm.py; hierarchy still hardcodes Azure. |
code_agent.py |
LLM writes Python → exec() with AST kwargs injection (289 lines) |
— | Unique to old design; replaced by CompileAgent + graph_compiler. |
validation_agent.py |
Vision-LLM execution validator + view selector (241 lines) | — | Unique to old design; new pipeline uses declarative env_adapters/.../success.py predicates instead. |
External usage
agents/hierarchy/is imported only fromlab/gym/envs/tasks/tableware/base_agent_env.py:23-26(old tableware task).gen_sim/.../agents/is imported only fromgen_sim/action_agent_pipeline/env_adapters/tableware/base_agent_env.py:30-36(new gen_sim path).
There's a parallel old/new split at the env-adapter layer too. Note agents/hierarchy/code_agent.py:216 imports embodichain.lab.sim.atom_actions (singular, old) while the new pipeline uses embodichain.lab.sim.atomic_actions/ (plural package, new) — same generational split exists at the sim layer.
Improvement options
- Finish the migration, delete
agents/hierarchy/. Migrate the one consumer (lab/gym/envs/tasks/tableware/base_agent_env.py) to the gen_sim pipeline, then removeagents/hierarchy/entirely.ValidationAgent(vision-LLM success checking) has no replacement, so either port it toenv_adapters/.../success.pyor accept losing that capability. - Extract shared base to
agents/hierarchy/_base.py, have both pipelines inherit. Removes theAgentBase+_resolve_prompt_path+ LLM-factory duplication. Tradeoff: introduces cross-package coupling (gen_sim would depend onagents/hierarchy/or vice versa); only worth it if Option 1 is blocked. - Leave as-is, document the split. Add a deprecation docstring at the top of
agents/hierarchy/marking it as superseded-by-gen_sim.
Recommendation: Option 1 if the old tableware task is still actively used; Option 3 as a holding pattern if it's already abandoned. Option 2 is a trap — it locks in the duplication by making it structural rather than temporary.
The real question for the PR author: is the old code_agent (LLM-writes-Python) path still needed for anything the JSON-graph path can't do yet? If not, finish the migration in a follow-up and delete agents/hierarchy/.
🤖 Generated with Claude Code
Thanks for the thorough review. I addressed the cleanup items in the latest update:
|
fb5165f to
6be53a6
Compare
turn 180
91e7811 to
bbc85e7
Compare
Thanks for calling this out. I finished the migration and removed the legacy Python-code-generation action-agent path.
|
yuecideng
left a comment
There was a problem hiding this comment.
Thanks for the refactor — the new action-agent runtime and generated-config pipeline is a solid architectural improvement. It removes arbitrary code execution, adds deterministic graph compilation, and the test coverage is good.
However, I consolidated reviews from several focused passes and there are a number of issues that should be addressed before merging. I left inline comments on the critical ones; the rest are summarized below.
Critical (must fix)
CompileAgentbypassesAgentBase.__init__(agents/compile_agent.py:43) — skipsconfig_dirresolution and prompt preloading.CompileAgent.get_composed_observationsbreaks Liskov substitution (agents/compile_agent.py:96) — returnsdict(kwargs)instead of merged prompt contents.- Infinite retry in
estimate_real_dimensions(gym_project_api/prompt2geometry/dimensions.py:74) —max_attempts=Nonehangs on persistent LLM failures. - Destructive
_cleanup_output_root(gym_project_api/prompt2geometry/pipeline.py:424) — can delete arbitrary directory contents and follow symlinks outside the target. - Untyped
cfgdefault inAgenticGenSimEnv.__init__(env_adapters/tableware/agent_env.py:43) — should beEmbodiedEnvCfg | None. - Agent config keys indexed before validation (
env_adapters/tableware/agent_env.py:63) — raisesKeyErrorinstead ofValueError. - Silent
num_envs == 1assumption (env_adapters/tableware/agent_env.py:117) —.squeeze(0)will break batched envs. - False-positive lifted success (
env_adapters/tableware/success.py:222) — missing initial height falls back to current pose. - GLTF
normalizedaccessor ignored (generation/mesh_bounds.py:370) — corrupts bounds for quantized meshes. - MD5 cache key (
generation/coacd_cache.py:56) — disallowed by many security baselines; use SHA-256.
Important (should fix)
Core / agents
- Fix
agents/__init__.py__all__to export classes (AgentBase,TaskAgent,CompileAgent,create_llm), not module names. - Replace
assertwith explicit exceptions and markgenerate/actas@abstractmethodinAgentBase. - Add type hints to
agents/llm.pyand untyped public functions inruntime/atom_actions.py. - Fix
_prepare_grasp_collision_cache_from_env_coacdswallowing broad exceptions. - Harden
coacd_cache_bridgepickle cache (shared writable pickle is a code-execution vector). - Reuse
MotionGeneratoracross atomic actions instead of creating one per call.
CLI / API
- Fix
"None"API-key string inprompt2geometry/config.py(str(None)becomes"None"). - Fix
target_replacement1/2help text vs. selection logic mismatch and unifynargssemantics between pipeline and standalone generator. - Expose or document missing flags in
pipeline_runner.py. - Fix
run_agent.pypassinginttoaction_sentence. - Validate URL scheme in
image2tabletop_client.py. - Type
pipeline_records.pyandtarget_replacements.pyproperly.
Generation
- Consolidate duplicate
_make_relative_events_config/_make_arrangement_events_config. - Extract shared arrangement/stacking bundle builder.
- Derive observation
joint_idsfrom robot config (currently hard-coded DualUR5). - Distinguish mesh load errors from missing dependencies in
_load_mesh_vertices. - Derive repo root from
gym_config_path, notPath.cwd(). - Unify basket bundle validator with shared UID checks.
- Skip texture extraction when normalized OBJ cache is valid.
- Use
dataclasses.replacein_replace_relative_spec_placements.
Tests
- Enable/document opt-in simulation integration tests in CI.
- Move success-predicate tests from
test_ur5_basket_config_generation.pytotest_tableware_success.pyand add failure cases. - Add a real dual-arm
execute_parallel_atomic_actionstest. - Split the ~3,500-line
test_ur5_basket_config_generation.py.
Conventions / minor
- Replace
print()calls with the project logger. - Fix the license header in
rearrangement.py. - Move shebangs below license headers.
- Add missing docstrings to public builder functions.
Recommendation
Address the critical inline comments first, then the important items. Once the critical and most important issues are fixed and tests remain green, this should be ready for another review pass.
| query_suffix = "." | ||
| prompt_kwargs: dict[str, dict[str, Any]] | ||
|
|
||
| def __init__(self, **kwargs) -> None: |
There was a problem hiding this comment.
CompileAgent.init bypasses AgentBase.init, so config_dir resolution and prompt preloading are skipped. Start with super().__init__(**kwargs) and remove the duplicated setattr loop.
| print("Compiled agent graph executed successfully.") | ||
| return result | ||
|
|
||
| def get_composed_observations(self, **kwargs): |
There was a problem hiding this comment.
This overrides get_composed_observations with different semantics (dict(kwargs) instead of merging prompt contents), breaking Liskov substitution. Align with AgentBase or rename the method.
| *, | ||
| object_prompt: str, | ||
| client: OpenAICompatibleClient, | ||
| max_attempts: int | None = None, |
There was a problem hiding this comment.
max_attempts=None makes the retry loop infinite when the LLM repeatedly returns invalid JSON. Set a finite default (e.g., 3–5) or require callers to pass one.
| ) | ||
|
|
||
|
|
||
| def _cleanup_output_root(output_root: Path, *, keep_path: Path | None) -> None: |
There was a problem hiding this comment.
_cleanup_output_root can delete arbitrary directory contents and follows directory symlinks outside output_root. Add path validation (ensure it's a dedicated subdirectory), skip symlinks whose targets are outside output_root, and consider requiring --allow-cleanup.
| class AgenticGenSimEnv(EmbodiedEnv): | ||
| """Config-driven agent environment for atomic-action tasks.""" | ||
|
|
||
| def __init__(self, cfg: EmbodiedEnvCfg = None, **kwargs): |
There was a problem hiding this comment.
cfg: EmbodiedEnvCfg = None is not type-safe. Use cfg: EmbodiedEnvCfg | None = None, and type reset similarly (options: dict[str, Any] | None = None) with an explicit return type.
| fail = torch.zeros_like(success) | ||
| return success, fail, {} | ||
|
|
||
| def _init_agents(self, agent_config, task_name, agent_config_path=None): |
There was a problem hiding this comment.
_init_agents indexes agent_config['Agent'], ['TaskAgent'], ['CompileAgent'] before validating that those keys exist. A malformed config raises KeyError instead of the intended ValueError. Validate keys first, then pass the section dicts.
| return filtered | ||
|
|
||
| def get_states(self): | ||
| # TODO: only support num_env = 1 for now |
There was a problem hiding this comment.
get_states calls .squeeze(0) on robot state, silently assuming num_envs == 1. Add an explicit guard that raises if num_envs != 1, or implement proper batched support.
| ) | ||
|
|
||
|
|
||
| def _object_lifted(env, spec: Mapping[str, Any]) -> torch.Tensor: |
There was a problem hiding this comment.
_object_lifted falls back to position[:, 2] as the initial height when obj_info lacks the object. That makes min_height always satisfied and causes false-positive success. Raise an error or load the true initial height from the environment config.
| if int(buffer_view.get("buffer", 0)) != 0: | ||
| raise ValueError("Only GLB embedded binary buffers are supported.") | ||
|
|
||
| stride = int(buffer_view.get("byteStride", component_size * component_count)) |
There was a problem hiding this comment.
_iter_gltf_accessor_vec3 ignores the GLTF accessor normalized flag. For normalized integer accessors, scale component values by the normalization factor or bounding-box computations will be corrupted.
| ) | ||
|
|
||
|
|
||
| def dexsim_coacd_cache_key_for_mesh( |
There was a problem hiding this comment.
Cache key uses hashlib.md5, which is disallowed by many security baselines. Prefer hashlib.sha256 (or blake2b) for new code.
| ), | ||
| ) | ||
| parser.add_argument( | ||
| "--target_replacement1", |
There was a problem hiding this comment.
target_replacement should be refactor to support 0-N objects replacement. Also, the target is defined as the foregraound intractive objects
| @@ -14,6 +14,6 @@ | |||
| # limitations under the License. | |||
| # ---------------------------------------------------------------------------- | |||
|
|
|||
There was a problem hiding this comment.
Why put so mang files under this folder? CLI usually contains only the entry points of a functionality.
| @@ -4,12 +4,12 @@ | |||
| # All rights reserved. | |||
| # ---------------------------------------------------------------------------- | |||
|
|
|||
| from typing import Dict, Optional | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
Can you still have these two envs run with agent?
| "mode": "modify", | ||
| "name": "robot/qpos", | ||
| "params": { | ||
| "joint_ids": [12, 13, 14, 15], |
There was a problem hiding this comment.
Hardcoded joint_ids: [12, 13, 14, 15] is DualUR5-specific. Derive these from the robot config so the observation setup works for other arms.
| "get_openai_compatible_llm_config", | ||
| ] | ||
|
|
||
| DEFAULT_LLM_MODEL = "gpt-4o" |
There was a problem hiding this comment.
DEFAULT_LLM_MODEL = "gpt-4o" is hardcoded. Consider loading the default from a project config or environment variable, and document how users can override it.
| DEFAULT_PIPELINE_HISTORY = ( | ||
| DEFAULT_ACTION_AGENT_WORKSPACE / "configs/pipeline_history.json" | ||
| ) | ||
| DEFAULT_TASK_NAME = "Demo3_Text" |
There was a problem hiding this comment.
DEFAULT_TASK_NAME = "Demo3_Text" is demo-specific. This belongs in a demo script or CLI default, not in the library defaults module.
| DEFAULT_ACTION_AGENT_WORKSPACE / "configs/pipeline_history.json" | ||
| ) | ||
| DEFAULT_TASK_NAME = "Demo3_Text" | ||
| DEFAULT_TASK_TEMPLATE_NAMES = frozenset({"Demo1_Text"}) |
There was a problem hiding this comment.
DEFAULT_TASK_TEMPLATE_NAMES = frozenset({"Demo1_Text"}) is also demo-specific. Move to demo-level configuration.
|
|
||
| def _container_runtime_uid(container: _SceneObject) -> str: | ||
| base = _base_name(container) | ||
| if "basket" in base: |
There was a problem hiding this comment.
Special-casing "basket" → "wicker_basket" is a hidden convention. Document it or make it configurable per task.
| "make_stacking_task_prompt", | ||
| ] | ||
|
|
||
| _BASKET_LEFT_RELEASE_OFFSET_Y = 0.04 |
There was a problem hiding this comment.
Physical constants (_BASKET_LEFT_RELEASE_OFFSET_Y = 0.04, _PLACE_LIFT_HEIGHT = 0.10, sample intervals) are scattered as module-level literals. Consider a @configclass for per-task physical parameters.
| container_config: Mapping[str, Any] | None, | ||
| ) -> list[int]: | ||
| if axis == "y": | ||
| side_order = {"left": 0, "right": 1} |
There was a problem hiding this comment.
side_order = {"left": 0, "right": 1} hardcodes the dual-arm convention. This should come from the robot config or arm-slot mapping.
| request_id: str = "prompt2geometry_asset_0" | ||
| output_name: str | None = None | ||
| zimage_base_url: str = "" | ||
| zimage_width: int = 1024 |
There was a problem hiding this comment.
Image-generation defaults (zimage_width=1024, zimage_height=1024, zimage_seed=42, zimage_num_inference_steps=8) are hardcoded. They are overrideable via the dataclass, but the defaults should be documented or moved to a service config.
| **_cfg_supported_kwargs( | ||
| AntipodalSamplerCfg, | ||
| { | ||
| "n_sample": int(runtime_kwargs.get("grasp_antipodal_n_sample", 20000)), |
There was a problem hiding this comment.
Grasp sampling defaults (n_sample=20000, max_length=0.088, min_length=0.003, viser_port=11801) are inline. While some are overrideable via runtime_kwargs, the defaults should be centralized in a config object.
| ) | ||
| z_offset = object_position[:, 2] - container_position[:, 2] | ||
| return ( | ||
| (xy_distance <= float(spec.get("xy_radius", spec.get("radius", 0.1)))) |
There was a problem hiding this comment.
Success tolerances (xy_radius default 0.1, min_z_offset -0.03, max_z_offset 0.25) are hardcoded. These should be task-configurable.
Description
Add the action-agent runtime and generated-config pipeline for image-to-tabletop manipulation demos.
This PR introduces the reviewable vertical slice for running generated simulation tasks end to end:
TaskAgentand deterministicCompileAgentwrappers for nominal task-graph generation, validation, and execution.embodichain.lab.sim.atomic_actions.run_agent.pyCLI for executing an existing generatedgym_config+agent_configpair.Reviewer Cleanup
Addressed the review feedback around stale and invalid action-agent code:
orientationfrom target-pose schema and added regression coverage that rejects it._build_action_cfg_and_start()helper by inlining cfg/start-qpos construction.Fixes # N/A
Type of change
Screenshots
N/A. This PR adds runtime, config generation, and CLI infrastructure without UI changes.
Validation
Result: all listed files were already formatted.
Result:
20 passed.Additional smoke checks:
CompileAgent(prompt_name="compile_agent_graph", ...).orientationis rejected as an unsupported target-pose field.Checklist